12 research outputs found
Data Discovery and Anomaly Detection using Atypicality.
Ph.D. Thesis. University of Hawaiʻi at Mānoa 2017
Pattern-Based Analysis of Time Series: Estimation
While Internet of Things (IoT) devices and sensors create continuous streams
of information, Big Data infrastructures are deemed to handle the influx of
data in real-time. One type of such a continuous stream of information is time
series data. Due to the richness of information in time series and inadequacy
of summary statistics to encapsulate structures and patterns in such data,
development of new approaches to learn time series is of interest. In this
paper, we propose a novel method, called pattern tree, to learn patterns in the
times-series using a binary-structured tree. While a pattern tree can be used
for many purposes such as lossless compression, prediction and anomaly
detection, in this paper we focus on its application in time series estimation
and forecasting. In comparison to other methods, our proposed pattern tree
method improves the mean squared error of estimation
Data Discovery and Anomaly Detection Using Atypicality for Real-Valued Data
The aim of using atypicality is to extract small, rare, unusual and interesting pieces out of big data. This complements statistics about typical data to give insight into data. In order to find such “interesting„ parts of data, universal approaches are required, since it is not known in advance what we are looking for. We therefore base the atypicality criterion on codelength. In a prior paper we developed the methodology for discrete-valued data, and the current paper extends this to real-valued data. This is done by using minimum description length (MDL). We develop the information-theoretic methodology for a number of “universal„ signal processing models, and finally apply them to recorded hydrophone data and heart rate variability (HRV) signal
A Pattern Dictionary Method for Anomaly Detection
In this paper, we propose a compression-based anomaly detection method for time series and sequence data using a pattern dictionary. The proposed method is capable of learning complex patterns in a training data sequence, using these learned patterns to detect potentially anomalous patterns in a test data sequence. The proposed pattern dictionary method uses a measure of complexity of the test sequence as an anomaly score that can be used to perform stand-alone anomaly detection. We also show that when combined with a universal source coder, the proposed pattern dictionary yields a powerful atypicality detector that is equally applicable to anomaly detection. The pattern dictionary-based atypicality detector uses an anomaly score defined as the difference between the complexity of the test sequence data encoded by the trained pattern dictionary (typical) encoder and the universal (atypical) encoder, respectively. We consider two complexity measures: the number of parsed phrases in the sequence, and the length of the encoded sequence (codelength). Specializing to a particular type of universal encoder, the Tree-Structured Lempel–Ziv (LZ78), we obtain a novel non-asymptotic upper bound, in terms of the Lambert W function, on the number of distinct phrases resulting from the LZ78 parser. This non-asymptotic bound determines the range of anomaly score. As a concrete application, we illustrate the pattern dictionary framework for constructing a baseline of health against which anomalous deviations can be detected
Learning Using Concave and Convex Kernels: Applications in Predicting Quality of Sleep and Level of Fatigue in Fibromyalgia
Fibromyalgia is a medical condition characterized by widespread muscle pain and tenderness and is often accompanied by fatigue and alteration in sleep, mood, and memory. Poor sleep quality and fatigue, as prominent characteristics of fibromyalgia, have a direct impact on patient behavior and quality of life. As such, the detection of extreme cases of sleep quality and fatigue level is a prerequisite for any intervention that can improve sleep quality and reduce fatigue level for people with fibromyalgia and enhance their daytime functionality. In this study, we propose a new supervised machine learning method called Learning Using Concave and Convex Kernels (LUCCK). This method employs similarity functions whose convexity or concavity can be configured so as to determine a model for each feature separately, and then uses this information to reweight the importance of each feature proportionally during classification. The data used for this study was collected from patients with fibromyalgia and consisted of blood volume pulse (BVP), 3-axis accelerometer, temperature, and electrodermal activity (EDA), recorded by an Empatica E4 wristband over the courses of several days, as well as a self-reported survey. Experiments on this dataset demonstrate that the proposed machine learning method outperforms conventional machine learning approaches in detecting extreme cases of poor sleep and fatigue in people with fibromyalgia